and PowerPC workstations using MPI. The increased accuracy of these
simulations was verified for the Eugene Island natural gas reservoir, located off
the coast of Louisiana. The results for the Eugene Island example are more 
accurate than results from previous finite difference solutions for the same 
simulation. 



On-line mathematical expression recognition using flexible structural
matching and hierarchical decomposition parsing. Chan,
Kam-Fai, Ph.D. Hong Kong University of Science and Technology (People's
Rep. of China), 1998. 155pp. Order Number DA9933219

With the recent advances in pen-based computing technologies, we al- 
ready have all the necessary hardware to provide an input device for en- 
tering mathematical expressions into computers in a natural way, i.e., we 
simply write the expressions on an electronic tablet for the computer to 
recognise them automatically. The key problem that remains is of course 
the automatic recognition of mathematical expressions, which is more on 
the software side. 

Mathematical expressions are generally two-dimensional structural
patterns. They typically consist of special symbols and Greek letters in
addition to English letters and digits. Moreover, characters and symbols may
appear in various positions, possibly of different sizes. All these together
make the recognition process very complicated even when all the individual
characters and symbols can be recognized correctly.

Mathematical expression recognition consists of two major stages: symbol
recognition and structural analysis. Character recognition, as the most
common type of symbol recognition problem, has been an active research
area for more than three decades. Structural analysis of two-dimensional
patterns also has a long history. However, very few papers have addressed
specific problems related to mathematical expression recognition.

In this thesis, we tackle various issues related to mathematical expression
recognition. In particular, we propose two methods to solve problems in
different stages of the recognition process, i.e., flexible structural matching
for symbol recognition and hierarchical decomposition parsing for structural
analysis. In addition, we incorporate some error detection and correction
mechanisms in both stages so that the overall recognition rate can be
improved. To show the effectiveness of the proposed methods, we also suggest
some schemes for evaluating recognition performance.

Experiments have been done on 600 mathematical expressions written by
10 writers. The results show that the recognition rates obtained are fairly
high and the recognition speed for a single expression ranges from 0.73
second to 6 seconds over different sizes of expressions, with the system running
in Prolog on a modest Sun SPARC 10 Unix workstation. This makes
mathematical expression recognition more feasible for real-world applications.



[...] detection under dependence. Chandramouli,
Rajarathnam, Ph.D. University of South Florida, 1999. 94pp. Adviser: N.
Ranganathan. Order Number DA99|2424

[...] for Markov-dependent observations are studied in this
dissertation. [...] and discrete amplitude observations [...] regular
Markov chain are considered. The null and the alternative hypotheses [...]
to two possibilities for the transition [...] matrix of the Markov chain.
The [...] detector performs sequential hypothesis tests to decide between
the two hypotheses. The test statistic [...] be a correlated random walk.

The results derived include the termination of the test, generalization
of Wald's first and second lemmas, a formula for the average sample
number required by the test, [...] bounds for the average sample
number, proof of the asymptotic efficiency, and other asymptotic results
related to the moments of the random walk. When the observations are
positively correlated it is observed that the proposed sequential test results
in a significant reduction in the average sample number compared to the
scenario when the observations are assumed to be independent. For
independent, identically distributed observations, it is seen that the proposed
test reduces to the classical sequential sign test.

For the continuous-time Markov-dependent observations with continuous
state space, the sequential detector is derived from the diffusion
approximation of its discrete counterpart. Here, the test is between the transition
rates of the Markov process. Both the unrestricted and the restricted
observation processes are considered. The observed stochastic process is a
version of a Poisson process with reversals. For the unrestricted process, it
is shown that its probability density function (pdf) satisfies a generalized
version of the telegrapher's equation. A closed-form solution to the pdf is
derived. Under certain limiting conditions it is observed that this process
reduces to the Wiener process. The sequential detector is derived by
considering the observed process to be restricted by two absorbing barriers. When
a barrier is crossed the corresponding hypothesis is accepted. The
probability density function of the restricted process is derived using the method of
Fourier series. The moments of the stopping time of the restricted process
are shown to satisfy an ordinary differential equation. From this, the
expected stopping time of the test is derived explicitly. An upper bound is
also given.



A cascaded distributed queue network. Chang, Chien-hung, Ph.D.
Illinois Institute of Technology, 1998. 180pp. Adviser: Graham Campbell.
Order Number DAM2137C

In this dissertation, we start from reviewing the Distributed Queueing
Switching Architecture (DQSA) family and then introduce one new
member, CDQ (Cascaded Distributed Queue Network).

When moving to a larger interleaving environment, necessary when "little
a" >> 1, the conventional protocols in the DQSA family suffer the problems
of increased access time, mutual usage of the data channel, and increased
delay. The new member, CDQ, is a new DQSA variation. By separating
a large network into several smaller geographic segments [...]
a hybrid mechanism to forward packets between segments, [...]
problems that emerge when serving large areas.

This hybrid packet forwarding [...], which combines three
sub-mechanisms, dramatically [...] access and delay time. When
the offered load on each segment is under 100% of the data channel's capacity,
the access time of long travel packets is reduced to the same as that of the packets
of local traffic. And the delay time is only the sum of its access time, its
propagation delay [...] to destination, and a very short waiting delay
on each hop to be forwarded to the next segment. The waiting delay is only a
little higher than that of an M/D/1 queue on average, even when "little a" is higher than 1000.

Considering that the system's offered load may be higher than its
physical capacity, this dissertation analyzes CDQ under this circumstance. A
new flow control mechanism is then developed to overcome the congestion
problem. We introduce Two-stage Random Access Flow Control (TRAFC)
to promote CDQ to be a lossless data-link protocol to carry data packets
across networks. We also introduce and analyze different implementation
variations and throttle algorithms. Simulation results are shown.



[...] new class discovery. Chang, Inbao, D.Sc.
The George Washington University, 1999. 105pp. Director: Murray Howard
Loew. Order Number DA[...]

This paper discusses classification problems with non-mutually-exclusive
classes. A new method, New Class Discovery (NCD), is proposed which
can discover potentially unknown or untrained classes. It is formally proved
that, where there are non-mutually-exclusive classes, the error probability
of the NCD method is less than that of the conventional classification method using
Bayes' rule, by at least $\int_{R} p(X \mid \pi')\, p(\pi')\, dX$, where $R$ is the $n$-dimensional
region covered by the new class $\pi'$, and $p(X \mid \pi')$ is the density function of
the new class.

Experiments were conducted using both machine-generated data and
[...] data. The result demonstrated that the NCD method
has significantly lower [...] than that of a Bayes classifier
[...] similar situation, and [...]
class patterns when classes are non-mutually-exclusive or [...].
Application of the NCD to real entomological data was [...]
a new species, which was independently verified by
taxonomists.

The NCD method also can provide a bridge between statistical pattern
recognition (SPR) methods in feature space, and artificial intelligence (AI)
methods applied in logical space.

Automating [...] test coverage measurement and
test selection. Chang, Juei, Ph.D. University of California, Irvine, 1999.
178pp. Chair: Debra J. Richardson. Order Number DA[...]

Software development is an error-prone process. Testing is the primary
tool for detecting defects in the software and has tremendous impact on the
quality of the developed product. Testing the implementation of the system
alone is often not adequate since it does not take into account what the
system is supposed to do. Testing should also involve the system's
specification. Incorporating specification-based testing into the development
process helps remove inconsistencies between the implementation and the
specification. This dissertation provides contributions in the area of
specification-based testing.

Abstract

This paper describes the design and the first steps of
implementation of Ofr (Optical Formula Recognition),
a system for extracting and understanding mathematical
expressions in printed documents. Our approach
clearly separates the OCR step, geometrical treatments and
syntactic analysis. In this paper we focus on the third
part: we define a class of context-sensitive graph grammars
for mathematical formulas, study their properties
and show how to remove their ambiguities (by
adding contexts in rules) to define efficient parsing.
This method is based on a "critical pairs" approach in
the sense of the Knuth-Bendix algorithm.

Introduction 

The paper is organized as follows: first we introduce
the domain of mathematical formula recognition
and discuss works from the literature. In the second
part, we briefly present our objectives and the
motivations of our research. In the third section, we
present the methodology we have used. The fourth
part presents the formalism of our graph grammars and
the properties that allow ambiguities to be eliminated, and shows
how to use them to parse formulas. The fifth part briefly
describes the implementation. Then, we conclude
by describing future work.

1 Mathematical formulas recognition 

There is a wealth of mathematical knowledge that
could be very useful in many computational
applications, but this material is not available in
electronic form. All this knowledge is in mechanics, physics
and mathematics books which have been references in
the domain for many years. Other, newer sources
are publications and articles, full of useful information,
which are often difficult to obtain. Currently, the
only way to use this mathematical information is to
re-type formulas on a keyboard in order to add them to a
Computer Algebra System (CAS) or to any application
using mathematical input. To automate
this job, the problem to solve is: how can the syntax
tree of a formula be built from graphical information alone
(characters and their positions)?

*also I3S, CNRS URA 1376, Université de Nice Sophia

Much work has been done since the sixties on
parsing two-dimensional expressions. Parsing 2D
expressions is more difficult than parsing strings because
mathematical expressions, or other 2D languages,
differ greatly from text. A line of text is one-dimensional
and discrete: characters are placed one after another
on the same line. But symbols in mathematical
expressions may be below, above, on the upper right, far away,
included in another, etc., with continuous distances.

Different methods were used to solve this problem
with varying success ([1], [3], [6], [8], [10], [13], [14]).
They can roughly be classified into syntax-directed
recognition ("local" context-free grammar approach
and heuristics) and geometry-directed recognition
using the layout structure of symbols.

These methods have difficulty handling "non-linear"
formulas like matrices or systems of equations.
Also, parsing techniques in the literature often have
exponential complexity when they use 2D grammars
derived from classical context-free string grammars with
geometrical predicates: the exploration and backtracking
used in these parsers lead to exponential complexity.

2 Objectives 

Our aim is to start from scanned images (bitmaps)
of documents containing formulas and to extract, read
and parse them in order to re-use them in other
applications. As we mentioned in the introduction, it would
be very interesting to have such a tool. Different
fields of application can be considered: building a
knowledge base (a database of mathematical formulas or
a CAS database), copy/paste between a PostScript
viewer and a CAS or an editor, and complementing current
professional OCR systems, which do not have formula recognition
capabilities.
3 Design of Ofr

The Ofr system described in this paper is based on 
three different components, each one giving just the 
necessary information to the next process. 







Figure 1: The Ofr architecture 



3.1 OCR (Optical Character Recognition) 
The OCR step is important, and research on this
subject has led to good solutions, at least for printed
characters. In this paper, we do not discuss the
recognition problem. We assume that there is a process
which outputs the symbols of the sheet and
information about them. We need the bounding box of
each symbol (position in absolute or relative
coordinates) and the size of the character. For the size, we do not
suppose that the OCR can give the absolute size in points,
but only a relative size between all characters. A
reference point of the character could also be helpful.

3.2 Definition of graphs 

We introduce an intermediate combinatorial
structure between the recognized symbols, with their positions
in the plane, and the tree of the formula. It is a graph
linking the graphical objects of the paper sheet. This
graph contains geometrical information about all the
elements of the formula and their relative positions.

Extending the usual methodology of string language
analysis, we use the notion of lexical unit, or
token. In our case, a token is initially a symbol of the
sheet, and becomes a more complicated expression during
the parsing process.

The graph builder constructs a graph with all
tokens. Oriented links between vertices are deduced
from graphical information. These oriented graphical
links are intended to capture all useful geometric
information about the relative positions of characters. In fact, this
step is a generalization of the only two links, before and
after, that determine relative character position in a
computer input string. Of course, the construction of the
graph is difficult, but we think that the separation of
geometry and syntax is very important to understand
and solve the problem of parsing 2D expressions, like
mathematical ones.

Every object will be represented by terms or finite
sets of terms. The set $T(F, V)$ of terms is inductively
defined by the set $F$ of functional symbols of fixed
arity and the set $V$ of variables: variables are terms, and
if $t_1, \ldots, t_n$ are terms, and $f$ is an $n$-ary functional
symbol of $F$, then $f(t_1, \ldots, t_n)$ is a term. We note $Var(t)$
the set of variables occurring in a term (or a finite set
of terms) $t$. We use uppercase symbols for functional
symbols and lowercase ones for variables.

• a vertex is a term $V(t, v, i)$ where: $t$ is the lexical
type ("Letter", "Digit", etc.), $v$ is the value (typically
a mathematical expression in term form:
$x$, $Mult(x, y)$, etc.), $i$ is an identifier.

• an edge is a term $E(t, v_1, v_2)$ where: $v_1$ and $v_2$ are
vertices, $t$ is the type of the edge, i.e. a term $L(d, w)$, $d$
being a graphical direction ("Left", "Top", etc.),
and $w$ being a weight, encoding the relative proximity
of the two symbols.

• a graph is a finite set of
edges: $\{E(t_1, v_{1,1}, v_{2,1}), \ldots, E(t_n, v_{1,n}, v_{2,n})\}$. The
set $\{v_{i,j}\}$ is the set of vertices of the graph. For
simplicity, we suppose that graphs are connected
and have at least one edge¹.
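
The three definitions above can be sketched as plain data structures. This is only an illustrative encoding, not the Klone implementation described later: the class names, the integer weight, the direction names "TopRight" and "Right", and the lexical type "Operator" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet


@dataclass(frozen=True)
class Vertex:
    """Term V(t, v, i): lexical type, value (a term), identifier."""
    lex_type: str
    value: Any
    ident: int


@dataclass(frozen=True)
class Link:
    """Term L(d, w): graphical direction and proximity weight."""
    direction: str
    weight: int  # assumed integer scale; the text only says "a weight"


@dataclass(frozen=True)
class Edge:
    """Term E(t, v1, v2): oriented link of type t from v1 to v2."""
    link: Link
    src: Vertex
    dst: Vertex


def vertices(graph: FrozenSet[Edge]) -> FrozenSet[Vertex]:
    """A graph is a set of edges; its vertex set is read off the edges."""
    return frozenset(v for e in graph for v in (e.src, e.dst))


# Hypothetical encoding of a small formula: A with exponent 2, then + B.
a = Vertex("Letter", "A", 1)
two = Vertex("Digit", "2", 2)
plus = Vertex("Operator", "+", 3)
b = Vertex("Letter", "B", 4)
graph = frozenset({
    Edge(Link("TopRight", 1), a, two),
    Edge(Link("Right", 1), a, plus),
    Edge(Link("Right", 1), plus, b),
})
```

Because a graph is just a set of edges, an isolated vertex cannot be represented, which is why the text assumes that every vertex appears in at least one edge.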

The next step is now to define a type of grammar and a 
parsing method to use this combinatorial structure in 
order to derive tree (term) representations of formulas. 

4 Structure analysis : graph grammar 

Graph grammars provide a useful formalism to
describe structural manipulations of multi-dimensional
data. They were introduced in [11] to solve picture
processing problems, and are studied from a theoretical
point of view in [12], and from a more practical one in [2], [4].
For a good overview of this subject, see [5], [9].

A graph grammar is specified by a set of production
rules. The role of a rule is to replace a matched subgraph
by another one. This process depends on a
specification of the desired embedding, which means that there
are different ways to replace the matched subgraph.
4.1 Definition 

We use context-sensitive graph grammars. A rule of
the grammar expresses that a sub-graph of the graph can
be collapsed into a new vertex (representing the
sub-formula) if some conditions are verified by the involved
tokens. Rules and grammars are defined as follows:

• a rule is a term $V \leftarrow G, C$ where: $V$ is a vertex,
called the "production" of the rule, $G$ is a graph, called
the "pattern" of the rule, and $C$ is a finite set of graphs, called
the "context" of the rule, discussed in the next section.

• a grammar is a finite set of rules.

¹ Then every vertex appears in at least one edge. This is
not a restriction: we can add a generic vertex, connected to
every vertex with generic edges.

Given a graph representing a formula, rules are
intended to rewrite it by replacing sub-graphs by
vertices whose values are term forms of the recognized
sub-formulas. This process uses matching and
replacement in a way that we make precise below. First, we recall
the notions of substitution and term matching:

• a substitution is an endomorphism of $T(F, V)$,
i.e. a mapping $\sigma$ verifying $\sigma f(t_1, \ldots, t_n) =
f(\sigma t_1, \ldots, \sigma t_n)$ for all $f$ in $F$ and all terms $t_1, \ldots, t_n$. A
substitution $\sigma$ is uniquely determined by its
restriction $\sigma|_V$ to the set of variables.

• a term $t$ matches a term $t'$, noted $t \le t'$, iff there
exists a substitution $\sigma$ such that $\sigma t = t'$.
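
As an illustration of these two notions, here is a minimal Python sketch. The encoding is an assumption made for the example: a variable is a string starting with a lowercase letter (following the paper's convention), a constant is any other atom, and a compound term is a tuple whose first element is its functional symbol. The symbol "Plus" is hypothetical; only "Mult" appears in the text.

```python
def is_var(t):
    """Variables are lowercase strings, following the convention of the text."""
    return isinstance(t, str) and t[:1].islower()


def substitute(sigma, t):
    """Apply sigma (a dict variable -> term) to every variable of t."""
    if is_var(t):
        return sigma.get(t, t)
    if isinstance(t, tuple):
        return tuple(substitute(sigma, a) for a in t)
    return t  # constant symbol


def match(pattern, target, sigma=None):
    """Return sigma with substitute(sigma, pattern) == target, or None."""
    sigma = {} if sigma is None else sigma
    if is_var(pattern):
        if pattern in sigma:
            # A variable already bound must be bound to the same term.
            return sigma if sigma[pattern] == target else None
        return {**sigma, pattern: target}
    if isinstance(pattern, tuple):
        if not isinstance(target, tuple) or len(pattern) != len(target):
            return None
        for p, t in zip(pattern, target):
            sigma = match(p, t, sigma)
            if sigma is None:
                return None
        return sigma
    return sigma if pattern == target else None


pattern = ("Mult", "x", ("Plus", "x", "y"))
target = ("Mult", "A", ("Plus", "A", "B"))
sigma = match(pattern, target)
```

Here `sigma` binds `x` to `A` and `y` to `B`. Replacing the inner `A` of `target` by `B` makes the match fail, because `x` would need two different images.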

Matching of finite sets of terms is defined by : 

Then a rule $r = V \leftarrow G, C$ rewrites a graph $G_1$
into a graph $G_2$, noted $G_1 \to_r G_2$, iff there exist a
substitution $\sigma$ and a sub-graph $G'$ of $G_1$ (i.e. $G' \subseteq G_1$)
such that: $\sigma G = G'$; for every graph $H$ in the context $C$,
there is no substitution $\tau$ such that $\tau|_{Var(G)} = \sigma|_{Var(G)}$
and $\tau H \subseteq G_1$; and $G_2$ is obtained by collapsing $G'$
into $\sigma V$, i.e. removing in $G_1$ all edges of $G'$ and
replacing in $G_1$ all the vertices of $G'$ by the vertex $\sigma V$.
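
The rewriting step can be sketched in Python as follows. This is a naive backtracking illustration under stated assumptions, not the parser of the paper: terms are tuples, variables are lowercase strings, and the rules `r_plus`, `r_pow`, the direction names and the lexical types "Operator" and "Expr" are hypothetical. The function also returns the new vertex, since a graph that collapses completely has no edge left to carry it.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].islower()

def substitute(sigma, t):
    if is_var(t):
        return sigma.get(t, t)
    if isinstance(t, tuple):
        return tuple(substitute(sigma, a) for a in t)
    return t

def match(p, t, sigma):
    """Extend sigma so that substitute(sigma, p) == t, or return None."""
    if is_var(p):
        if p in sigma:
            return sigma if sigma[p] == t else None
        return {**sigma, p: t}
    if isinstance(p, tuple):
        if not isinstance(t, tuple) or len(p) != len(t):
            return None
        for a, b in zip(p, t):
            sigma = match(a, b, sigma)
            if sigma is None:
                return None
        return sigma
    return sigma if p == t else None

def match_all(edges, graph, sigma):
    """Yield every extension of sigma mapping all pattern edges into graph."""
    if not edges:
        yield sigma
        return
    for g in graph:
        s = match(edges[0], g, sigma)
        if s is not None:
            yield from match_all(edges[1:], graph, s)

def rewrite(rule, graph):
    """Apply rule = (production, pattern, contexts) once, or return None."""
    production, pattern, contexts = rule
    for sigma in match_all(pattern, graph, {}):
        # The rule is blocked if some context graph H embeds in the data
        # graph through a substitution that agrees with sigma.
        if any(next(match_all(h, graph, sigma), None) is not None
               for h in contexts):
            continue
        matched = {substitute(sigma, e) for e in pattern}
        old = {v for e in matched for v in e[2:]}
        new = substitute(sigma, production)
        swap = lambda v: new if v in old else v
        g2 = frozenset((e[0], e[1], swap(e[2]), swap(e[3]))
                       for e in graph - matched)
        return g2, new
    return None

# Data graph: A with exponent 2, followed by + and B (hypothetical encoding).
VA, V2 = ("V", "Letter", "A", 1), ("V", "Digit", "2", 2)
VP, VB = ("V", "Operator", "+", 3), ("V", "Letter", "B", 4)
graph = frozenset({("E", ("L", "TopRight", 1), VA, V2),
                   ("E", ("L", "Right", 1), VA, VP),
                   ("E", ("L", "Right", 1), VP, VB)})

plus_pattern = [("E", ("L", "Right", "w1"), ("V", "ta", "a", "i1"),
                 ("V", "Operator", "+", "i")),
                ("E", ("L", "Right", "w2"), ("V", "Operator", "+", "i"),
                 ("V", "tb", "b", "i2"))]
plus_prod = ("V", "Expr", ("Plus", "a", "b"), "i")
# Context: the left operand of + carries an exponent.
has_exponent = [("E", ("L", "TopRight", "w3"), ("V", "ta", "a", "i1"),
                 ("V", "te", "e", "i3"))]
r_plus = (plus_prod, plus_pattern, [])
r_plus_ctx = (plus_prod, plus_pattern, [has_exponent])
r_pow = (("V", "Expr", ("Power", "a", "e"), "i1"),
         [("E", ("L", "TopRight", "w"), ("V", "ta", "a", "i1"),
           ("V", "te", "e", "i2"))], [])
```

With an empty context, `rewrite(r_plus, graph)` collapses A, + and B first and leaves the exponent attached to the sum. With the context, `rewrite(r_plus_ctx, graph)` returns `None` until `r_pow` has been applied.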
4.2 Contexts of rules 

One of the main problems with grammars and
rewrite rules is the existence of ambiguities: two rules
can rewrite an object into two distinct objects.
Ambiguities can be removed, for example, by using
priorities, case analysis on the patterns of rules, or
Knuth-Bendix completion. These techniques hardly
apply to our case, which is why we use contexts in
rules: given a graph grammar which leads to
ambiguities, our goal is to add contexts to its rules to remove
these ambiguities as automatically as possible.

When two rules can apply to two sub-graphs which 
have disjoint sets of vertices, there is no ambiguity: 
applications of the two rules commute. 

Ambiguities can appear when the two patterns of 
the rules can be superposed: 

• two graphs $G_1$ and $G_2$ can be superposed iff there
exist $\sigma_1$ and $\sigma_2$ such that $\sigma_1 G_1$ and $\sigma_2 G_2$ have a
common vertex. We note $S(G_1, G_2)$ the set of
couples of such substitutions, called superpositions
of $G_1$ and $G_2$.

• given two rules $r_i = V_i \leftarrow G_i, C_i$, $i = 1, 2$, the set
$A(r_1, r_2)$ of ambiguities of $r_1$ and $r_2$ is defined as
the subset of $S(G_1, G_2)$ formed by couples $(\sigma_1, \sigma_2)$
such that the two rules can apply to the graph
$\sigma_1 G_1 \cup \sigma_2 G_2$, i.e. $\forall i = 1, 2$, $\forall H \in C_i$, there is
no substitution $\tau$ such that $\tau|_{Var(G_i)} = \sigma_i|_{Var(G_i)}$
and $\tau H \subseteq \sigma_1 G_1 \cup \sigma_2 G_2$.

The set $S(G_1, G_2)$ can be infinite, but "minimal"
superpositions are finite in number, as shown by the next
propositions. First we define a pre-order on couples of
substitutions:

Definition 1

$(\sigma_1, \sigma_2) \le (\sigma'_1, \sigma'_2) \iff \exists \tau,\ \sigma'_1 = \tau \circ \sigma_1,\ \sigma'_2 = \tau \circ \sigma_2$

Proposition 1 Given two graphs $G_1$ and $G_2$, there
exists a finite subset $S_0$ of $S(G_1, G_2)$ such that:

$\forall c \in S(G_1, G_2),\ \exists c_0 \in S_0,\ c_0 \le c$.

We omit proofs for lack of space (see [7] for details).

Proposition 2 There exists a unique (up to renaming
of variables), minimal (for $\le$) and finite set of
superpositions of two graphs.

We will note $S_0(G_1, G_2)$ this minimal set.

Because of contexts, the set of ambiguities of two
rules has a more complicated structure than the set
of superpositions of their patterns. But the following
property will help us to describe it:

Proposition 3 Let $r = V \leftarrow G, C$ be a rule, $r^* =
V \leftarrow G, \emptyset$ be the same rule with empty context, and $G_1, G_2$
two graphs, such that $G_1 \to_{r^*} G_2$ and $G_1 \not\to_r G_2$.

Let $\rho$ be a substitution.

Then $\rho G_1 \to_{r^*} \rho G_2$ and $\rho G_1 \not\to_r \rho G_2$.

Let us be given two rules $r_1$ and $r_2$ with patterns $G_1$
and $G_2$, and a superposition $(\sigma_1, \sigma_2)$ of $S(G_1, G_2)$. If
we remove the contexts of the rules, then both apply to the
graph $\sigma_1 G_1 \cup \sigma_2 G_2$. Suppose now that $(\sigma_1, \sigma_2)$ is not
an ambiguity of $r_1$ and $r_2$. Then one of the two rules
(with their contexts) does not apply to $\sigma_1 G_1 \cup \sigma_2 G_2$.
The last proposition implies that for every substitution
$\tau$, the superposition $(\tau \circ \sigma_1, \tau \circ \sigma_2)$ is not an
ambiguity of $r_1$ and $r_2$. More briefly, we have then:

Proposition 4 If $c \in S(G_1, G_2) \setminus A(r_1, r_2)$ and $c \le c'$
then $c' \in S(G_1, G_2) \setminus A(r_1, r_2)$.

This means that the complement of $A(r_1, r_2)$ in
$S(G_1, G_2)$ has the same nice "cone" structure as
$S(G_1, G_2)$. In particular, we have:

Corollary 1 If the minimal superpositions of the patterns
of two rules are not ambiguities of the rules, then
the rules have no ambiguities.

We will now exploit this property to define a
method that, given a grammar with ambiguities, adds
contexts to its rules in order to obtain a new grammar
without ambiguities.
4.3 Construction of Contexts 

We can now give the general formulation of the
method which removes the ambiguities of a context-free
graph grammar by adding contexts.

Let $\mathcal{G} = \{r_1, \ldots, r_n\}$ be a graph grammar, with $r_i =
V_i \leftarrow G_i, \emptyset$.

Let $\{(\sigma_{k1}, \sigma_{k2}),\ k = 1, \ldots, a_{ij}\}$ be the minimal
superpositions of $G_i$ and $G_j$. Because contexts are empty,
these superpositions coincide with ambiguities.

Suppose that for each minimal superposition (when
two rules can be applied simultaneously) we have a
method to choose the right rule to apply. This is
equivalent to giving a function $C$ such that $C(i, j, k) \in \{1, 2\}$:
if $C(i, j, k) = 1$, this means that we want to prevent
the application of rule $r_j$ in the ambiguity $(\sigma_{k1}, \sigma_{k2})$. We
achieve this goal by adding the context $\sigma_{k1} G_i$ to rule $r_j$.

Doing this for all rules and all minimal
superpositions, we define the new grammar $\mathcal{G}' = \{r'_1, \ldots, r'_n\}$
where:

$r'_i = V_i \leftarrow G_i,\ \bigcup_j \{\sigma_{k2} G_j \mid k = 1, \ldots, a_{ij},\ C(i, j, k) = 2\}$

By Corollary 1, we have then:

Proposition 5 The grammar $\mathcal{G}'$ has no ambiguity.
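
The construction can be sketched in a few lines of Python, under strong assumptions: the minimal superpositions are supplied as input (computing them is the hard part and is not shown), terms are tuples with lowercase strings as variables, and the function name `add_contexts`, the two example rules and the single superposition are hypothetical. Indices are 0-based in the code.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].islower()


def substitute(sigma, t):
    """Apply sigma (a dict variable -> term) to every variable of t."""
    if is_var(t):
        return sigma.get(t, t)
    if isinstance(t, tuple):
        return tuple(substitute(sigma, a) for a in t)
    return t


def add_contexts(rules, superpositions, choose):
    """rules: list of (production, pattern) with empty contexts.
    superpositions[(i, j)]: list of pairs (sigma1, sigma2) for G_i and G_j.
    choose(i, j, k) -> 1 to keep rule i (block j), 2 to keep rule j (block i).
    """
    contexts = [[] for _ in rules]
    for (i, j), pairs in superpositions.items():
        for k, (s1, s2) in enumerate(pairs):
            if choose(i, j, k) == 1:
                # Block r_j: add sigma_k1 G_i to the context of r_j.
                contexts[j].append([substitute(s1, e) for e in rules[i][1]])
            else:
                # Block r_i: add sigma_k2 G_j to the context of r_i.
                contexts[i].append([substitute(s2, e) for e in rules[j][1]])
    return [(prod, pat, ctx) for (prod, pat), ctx in zip(rules, contexts)]


# Rule 0: addition; rule 1: power (both encodings are assumptions).
plus_rule = (("V", "Expr", ("Plus", "a", "b"), "i"),
             [("E", ("L", "Right", "w1"), ("V", "ta", "a", "i1"),
               ("V", "Operator", "+", "i")),
              ("E", ("L", "Right", "w2"), ("V", "Operator", "+", "i"),
               ("V", "tb", "b", "i2"))])
pow_rule = (("V", "Expr", ("Power", "x", "e"), "ix"),
            [("E", ("L", "TopRight", "w"), ("V", "tx", "x", "ix"),
              ("V", "te", "e", "ie"))])
rules = [plus_rule, pow_rule]

# One superposition: the base of the power is the left operand of the sum.
superpositions = {(0, 1): [({}, {"tx": "ta", "x": "a", "ix": "i1"})]}
new_rules = add_contexts(rules, superpositions, lambda i, j, k: 2)
```

In this example the choice function always prefers the power rule, so the addition rule receives the (renamed) pattern of the power rule as a context and the power rule keeps an empty context.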

Figure 2 shows a representation of a context-free
grammar $\mathcal{G}$ which has two rules, one for addition, $x + y$,
and one for implicit power, $x^y$.

Figure 2: Graphs for rules $r_+$ and $r_\wedge$

For rule $r_+$: the first part is the description of the
"production" of the rule, i.e. the node into which the matched
graph will be rewritten. The second part is the graph to match
(the "pattern"). This graph consists of two edges,
with their types (direction and weight) and nodes.

Suppose we have the formula $A^2 + B$, given by the
graph represented in Figure 3.

Figure 3: Graph for the formula $A^2 + B$

There is an ambiguity because we can apply the two
rules to the data graph. If we apply $r_+$, the vertices $A$,
$+$ and $B$ collapse, and we obtain a graph representing
the formula $(A + B)^2$. If we apply $r_\wedge$, then we obtain
a graph representing the formula $(A^2) + B$. The right
choice is clearly to apply $r_\wedge$: we have to prevent the
application of $r_+$ in this case.

Here are, in graphic representation, the four minimal
superpositions of rules $r_+$ and $r_\wedge$ (the data graph is a
particular case of the first superposition):

Figure 4: All the overlaps for rules $r_+$ and $r_\wedge$

We have defined some general criteria to obtain
the $C$ function: firstly, the mathematical priority of
the described operator; secondly, graphical information.
In our example, $\wedge$ has a greater priority than $+$,
but this does not mean that this rule should always be
applied before the other one. This priority is right for the
linear case. In the case of the $\wedge$ operator, there are implicit
parentheses around the arguments, and this is true for all such
operators with arguments on different levels.
With these two simple criteria
we are able to define which rule should be applied in
each case of ambiguity.

In our example, for the first superposition, we do
not want to apply $r_+$ first, so we just add the pattern
of rule $r_\wedge$ to the contexts of $r_+$. Thus $r_+$ will not be
applied if there is a power on its argument.

Doing this for all superpositions, we obtain a
context-dependent grammar. This grammar works
well on formulas of this kind.
4.4 Parser 

The parsing algorithm we use is a bottom-up
algorithm. Trying to simulate this global view by top-down
parsing is possible, but we think that this approach
would lead to a combinatorial explosion and
exponential parsing time, which is generally the case for the
two-dimensional parsing algorithms in the literature.
5 Implementation

We use an HP ScanJet 4c scanner to scan a document of
mathematical expressions and save it as a binary image
file. The OCR currently used is a small package (about
1500 lines of C) written in our team, which is able to
recognize documents printed by LaTeX. One of our aims
is to replace it by a more generic OCR, which would
be able to learn from a file and its ASCII translation,
and then recognize a document in the same fonts.

The Graph Builder and Graph Grammar packages
are currently implemented in Klone, a Common Lisp
dialect. The advantages of Klone were useful to quickly
develop these experimental packages on graph grammars.
Graph Builder and Graph Grammar are about
7000 lines of Klone. Everything runs under UNIX.

6 Conclusion 

We have presented a method and a system to recognize
scanned mathematical formulas. The system is
composed of three clearly separated modules (OCR,
graph builder, graph grammars and parsing). On a
theoretical level, we use a graph grammar and we have
defined a method to remove ambiguities of grammars.
On a practical level, we have a first implementation of
the method, which works on various complex formulas,
obtained from bitmap images of formulas, with good
time complexity. The grammars defined for these formulas
are not trivial, using more than 50 operators, with
many kinds of constructions: linear operators, vertical
operators, 2D tree-operators, 2D cyclic-operators,
implicit operators. For most grammars dealing with
these constructions, we are able to remove ambiguities
correctly with the presented criteria. Only some
cases need heuristics to be solved.

In the future, we will focus on the first two parts of
Ofr: the OCR component and the graph builder. The
main problem in graph building is to find a good trade-off
between two extreme cases: a graph with many
links will represent more than one formula, and so
lead to inconsistency; a graph with few links will not
contain sufficient information to build the formula.